Prefix Codes for Power Laws with Countable Support

Michael B. Baer, Member, IEEE 

Abstract: In prefix coding over an infinite alphabet, methods that
consider specific distributions generally consider those that decline more
quickly than a power law (e.g., Golomb coding). Particular power-law
distributions, however, model many random variables encountered in
practice. For such random variables, compression performance is judged
via estimates of expected bits per input symbol. This correspondence
introduces a family of prefix codes with an eye towards near-optimal
coding of known distributions. Compression performance is precisely
estimated for well-known probability distributions using these codes and
using previously known prefix codes. One application of these near-optimal
codes is an improved representation of rational numbers.

Index Terms: Coding of integers, continued fractions, infinite alphabet,
optimal prefix code, power law, rational numbers, search trees,
Shannon entropy.

I. INTRODUCTION 
Consider discrete power-law distributions, those of the form

$$ p(i) \sim c\, i^{-\alpha} $$

for constants $c > 0$ and $\alpha > 1$, where $p(i)$ is the probability
of symbol $i$, and $f(i) \sim g(i)$ implies that the ratio of the two
functions goes to 1 with increasing $i$. Such distributions could be
either inherently discrete or discretized versions of continuous power-law
distributions.

Several researchers in varied fields have, in classic papers ranging
from decades to centuries old, observed power-law behavior for
various discrete phenomena. These include distribution of wealth
[1], [2], town and city populations [2], [3], word frequency [2],
[4], [5], numbers of species of a given genus [2], [6], and terms
in continued fractions [7], [8]. More recent papers model various
Internet phenomena [9]. So active is the topic that several surveys
and popular expositions exist, e.g., [9]-[11].

However, there has been relatively little work on lossless compression
of symbols obeying such distributions, in spite of a rich
literature on prefix coding problems [12]. Exponential-Golomb codes
[13] (generalizations of Elias' γ code [14]) are a good fit for certain
power laws [15], [16], leading to their widespread use in compressing
video and numerical data [15], [17]. To the author's knowledge,
though, only one specific infinite-cardinality power-law distribution,
the Gauss-Kuzmin distribution [18, p. 341], has been used to judge
compression performance of prefix codes [19], [20].

Here we propose simple codes which not only improve upon exist- 
ing codes for encoding symbols distributed according to the Gauss- 
Kuzmin distribution — which applies to coding rational numbers 
using continued fractions — but also efficiently code other common 
distributions, such as the zeta distribution with parameter 2 [21], [22]. 
We estimate compression performance for dozens of code/distribution 
combinations. For fixed codes, these estimates are rigorously shown 
to be precise. 

II. BACKGROUND, FORMALIZATION, AND MOTIVATION

The most common infinite-alphabet codes are codes that are
optimal for geometric [23], [24] and geometrically-based [25]-[29]
distributions. For geometric distributions, these are known as Golomb
codes, and are based on the unary code (ones terminated by a zero,
i.e., a code consisting of codewords of the form $1^j0$ for $j \ge 0$). In
a Golomb code ($G_k$), a unary code prefix precedes a binary code
suffix. This binary suffix is a complete binary code, in that it has $k$
codewords of the same length or of lengths differing by at most one. For
example, the alphabetic complete binary code of size three that is
monotonically nondecreasing in length is {0, 10, 11}, so the Golomb
code $G_3$ is {00, 010, 011, 100, 1010, ...}. If the complete binary
code suffix is of constant length, the overall Golomb code is also
called a Rice code. Rice codes are used in standards such as JPEG-LS
[30]. Codes that exhibit an efficient coding rate for power laws,
by contrast, are not known to be optimal (excepting those with finite
support and trivial examples for dyadic probability mass functions).
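As a minimal sketch of the Golomb construction just described, the Python below writes the alphabetic complete binary suffix with its shorter codewords first. The helper names `complete_binary` and `golomb` are illustrative and do not come from the source. With $k = 3$ the sketch reproduces the $G_3$ example.

```python
def complete_binary(j: int, k: int) -> str:
    """(j+1)th codeword of the alphabetic complete binary code with k items.

    The first 2^ceil(lg k) - k words have length floor(lg k) and the rest
    have length ceil(lg k), so shorter words come first and order is kept.
    """
    if not 0 <= j < k:
        raise ValueError("need 0 <= j < k")
    width = (k - 1).bit_length()          # ceil(lg k) for k >= 1
    short = (1 << width) - k              # how many words are one bit shorter
    if j < short:
        width, value = width - 1, j
    else:
        value = j + short                 # skip the subtrees used by short words
    return format(value, f"0{width}b") if width else ""


def golomb(i: int, k: int) -> str:
    """Golomb codeword G_k(i) for i >= 1: unary prefix 1^q 0, then b(r, k)."""
    if i < 1 or k < 1:
        raise ValueError("need i >= 1 and k >= 1")
    q, r = divmod(i - 1, k)
    return "1" * q + "0" + complete_binary(r, k)


print([golomb(i, 3) for i in range(1, 7)])
# ['00', '010', '011', '100', '1010', '1011']
```

When $k$ is a power of two, every suffix has the same length, which gives the Rice codes mentioned above.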

We restrict ourselves to binary codes and assume that the symbols
to be coded are positive integers. Thus, an infinite-alphabet source
emits symbols drawn from the alphabet $\mathcal{X} = \{1, 2, 3, \ldots\}$. (Some
applications code the alphabet $\mathcal{X}_0 = \{0, 1, 2, \ldots\}$ or the alphabet
$\mathcal{X}_{\mathbb{Z}} = \{0, -1, 1, -2, 2, \ldots\}$, but any code of either form can be
mapped trivially to a code on $\mathcal{X}$.) Symbol $i$ has probability $p(i) > 0$,
forming probability mass function $P = \{p(i)\}$. The source symbols
are coded into binary codewords. The codeword $c(i) \in \{0,1\}^*$,
corresponding to symbol $i$, has length $n(i) \in \mathbb{Z}_+$, thus defining
length distribution $N = \{n(i)\}$. An optimal code is one that
minimizes $\sum_{i \in \mathcal{X}} p(i)n(i)$ under the constraint that the lengths
belong to a uniquely decodable code, which is possible if and only if the
Kraft inequality, $\sum_{i \in \mathcal{X}} 2^{-n(i)} \le 1$, is satisfied. We can assume
without loss of generality that these codes are prefix codes, that
is, codes where there are no two codewords of the form $c(i)$ and
$c(j) = c(i)x$, where $c(i)x$ denotes the concatenation of strings $c(i)$
and (nontrivial) $x$. (In a similar use of notation, $0^k$ and $1^k$ denote $k$
0's and $k$ 1's, respectively. Note also that we use lg to denote $\log_2$
and ln to denote $\log_e$, where $e$ is the base of the natural logarithm.)
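Both conditions can be checked mechanically on any finite set of codewords. The minimal Python sketch below uses illustrative function names that are not from the source. It computes the Kraft sum exactly with rational arithmetic and tests the prefix property.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact Kraft sum: the sum of 2^(-n(i)) over the given lengths."""
    return sum((Fraction(1, 2 ** n) for n in lengths), Fraction(0))


def is_prefix_free(codewords):
    """True if no codeword c(i) is a prefix of another codeword c(j) = c(i)x.

    After lexicographic sorting, any word that is a prefix of some other
    word is also a prefix of its immediate successor, so checking
    neighbours suffices. Duplicates are rejected as well.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


gamma_prefix = ["0", "100", "101", "11000", "11001", "11010", "11011"]
print(is_prefix_free(gamma_prefix), kraft_sum(map(len, gamma_prefix)))
# True 7/8
```

For an infinite code, a check on a finite prefix can only refute the prefix property or the Kraft inequality. It cannot prove them.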

One cannot use the Huffman source coding algorithm [31] to find
an optimal code, as one can for a finite source alphabet. However, it
is sensible that a code over the integers should be monotonic, that is,
that $n(i) \le n(i+1)$ for all $i > 0$. An exchange argument easily shows
that this is necessary for the code to be optimal given a distribution
for which $p(i) > p(i+1)$ for all $i$.

Also desirable is for a code to be alphabetic or order preserving;
that is, if $c(i, j)$ is the $j$th bit of the $i$th codeword, then $c(i+1, j) <
c(i, j)$ only if there is a $k < j$ such that $c(i+1, k) \ne c(i, k)$.
Alphabetic codes allow the prefix coding tree to be used as a
decision tree, which is useful for search problems, as in [32], [33].
It is also useful for implementation of arithmetic coding: because
binary arithmetic coding is much faster than other types of arithmetic
coding, a decision tree can reduce an infinite-alphabet source into
a binary source for fast arithmetic coding, as in [15]. In addition,
order preservation is necessary for the ordered representation of
rational numbers as integers in continued fractions [19], [20]; in this
correspondence we improve upon these representations.

Any valid monotonic prefix code has a (possibly different) alphabetic
prefix code with the same length distribution. For example, the
Elias γ code was first presented in a nonalphabetic version, then
transformed into alphabetic form (as a decision tree) in [32]. Where
there is ambiguity, we will assume use of the alphabetic version of
a code.
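For a monotonic length distribution, the alphabetic version can be produced directly with the usual canonical-code construction. The sketch below uses the hypothetical name `alphabetic_from_lengths` and takes a finite, nondecreasing prefix of $N$. Applied to the γ lengths 1, 3, 3, 5, ..., it yields the alphabetic γ codewords.

```python
def alphabetic_from_lengths(lengths):
    """Build alphabetic prefix codewords for a nondecreasing list of lengths.

    Codewords are consecutive binary numbers, left-shifted whenever the
    length grows (the canonical-code construction). For nondecreasing
    lengths this succeeds exactly when the Kraft sum is at most 1.
    """
    if any(n < 1 for n in lengths):
        raise ValueError("lengths must be positive integers")
    if any(a > b for a, b in zip(lengths, lengths[1:])):
        raise ValueError("lengths must be nondecreasing (a monotonic code)")
    words, value, prev = [], 0, lengths[0] if lengths else 0
    for n in lengths:
        value <<= n - prev                 # append zeros up to the new length
        if value >= 1 << n:                # ran out of codewords: Kraft sum > 1
            raise ValueError("lengths violate the Kraft inequality")
        words.append(format(value, f"0{n}b"))
        value += 1
        prev = n
    return words


print(alphabetic_from_lengths([1, 3, 3, 5, 5, 5, 5]))
# ['0', '100', '101', '11000', '11001', '11010', '11011']
```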

Another desirable property is one we call "smoothness":
Definition: We call $N = \{n(i)\}$ $j$-smooth if, for every $i > j$, if
$n(i+1) = n(i+2)$, then $n(i+1) - n(i) \le 1$, that is, there are
no "jumps" followed by "plateaus"; weakly smooth means that it is
$j$-smooth for some $j$. Thus, for any $j$, the weakly smooth codes include
all $j$-smooth codes. Similarly, the $j$-smooth (and thus weakly smooth)
codes include all 0-smooth (or strongly smooth) codes. Also, we call a
distribution $P = \{p(i)\}$ $j$-antiunary if, for every $i > j$, $p(i) < p(i+1) + p(i+2)$;
antiunary means that it is $j$-antiunary for some $j$.

Observation: No $j$-antiunary distribution has an optimal code
which is not $j$-smooth. Thus no antiunary distribution has an optimal
code which is not weakly smooth.

Proof: Suppose a $j$-antiunary distribution $P$ has an optimal code
with lengths $N$ which is not $j$-smooth. Then there exists an $i > j$
such that $n(i+1) = n(i+2)$ and $n(i+1) - n(i) > 1$. Consider $N' =
\{n'(i)\}$ for which $n'(k) = n(k)$ except at values $n'(i) = n(i) + 1$,
$n'(i+1) = n(i+1) - 1$, and $n'(i+2) = n(i+2) - 1$. Since $n(i+1) \ge n(i) + 2$,
the Kraft sum changes by $2 \cdot 2^{-n(i+1)} - 2^{-n(i)-1} \le 0$, so $N'$
satisfies the Kraft inequality, and the expected length changes by
$p(i) - p(i+1) - p(i+2) < 0$, so $\sum_i p(i)n'(i) < \sum_i p(i)n(i)$ and $N$
is not optimal. ■
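The observation can be exercised numerically. The Python sketch below uses illustrative helper names and checks only finite prefixes. The Elias γ lengths and the zeta(2) probabilities serve purely as an example.

The zeta(2) distribution is 3-antiunary. The sketch finds the first jump-then-plateau in the γ lengths beyond $i = 3$ and applies the exchange from the proof. The exchange does not increase the Kraft sum, and it lowers the expected length.

```python
import math
from fractions import Fraction


def first_violation(n, j):
    """Smallest 1-based i > j with n(i+1) == n(i+2) and n(i+1) - n(i) > 1.

    n is a finite prefix with n[0] = n(1); returns None if the prefix is j-smooth.
    """
    for i in range(j + 1, len(n) - 1):
        if n[i] == n[i + 1] and n[i] - n[i - 1] > 1:
            return i
    return None


def is_j_antiunary(p, j):
    """True if p(i) < p(i+1) + p(i+2) for every i > j inside the finite prefix."""
    return all(p[i - 1] < p[i] + p[i + 1] for i in range(j + 1, len(p) - 1))


def exchange(n, i):
    """The proof's modification: n(i) + 1, n(i+1) - 1, n(i+2) - 1."""
    m = list(n)
    m[i - 1] += 1
    m[i] -= 1
    m[i + 1] -= 1
    return m


def kraft(n):
    """Exact Kraft sum of a finite list of lengths."""
    return sum(Fraction(1, 2 ** k) for k in n)


gamma = [2 * (i.bit_length() - 1) + 1 for i in range(1, 16)]   # Elias gamma lengths
zeta2 = [6 / (math.pi ** 2 * i ** 2) for i in range(1, 16)]      # p(i) = 1/(i^2 zeta(2))

i = first_violation(gamma, 3)
better = exchange(gamma, i)
gain = sum(p * (a - b) for p, a, b in zip(zeta2, gamma, better))
print(i, is_j_antiunary(zeta2, 3), kraft(better) <= kraft(gamma), gain > 0)
# 7 True True True
```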

Every power law is antiunary, but most previously proposed codes 
suitable for power-law distributions are not weakly smooth, so they 
could not be optimal solutions. The proof shows that, when such 
codes are applied to antiunary distributions, it is always a simple 
matter to improve such a code for use with such a distribution. 

For many probability distributions, however, there is no guarantee 
that an optimal code would be computationally tractable, let alone 
computationally practical for compression applications. We thus 
judge performance of candidate codes by expected bits per coded 
symbol rather than by strict optimality. One of the contributions of 
this correspondence is a comparison of various codes for well-known 
power-law distributions. 

III. A NEW FAMILY OF CODES FOR INTEGERS 

We propose a family of monotonic, alphabetic, computationally
efficient, 0-smooth codes, starting with the code shown in the center
set of columns ($n_0(\cdot)$ and $c_0(\cdot)$) of Table I, which is defined as

$$
c_0(i) = \begin{cases}
0\,b(i-1, 3), & i < 4,\\
1\,c_0\!\left(\tfrac{i}{2} - 1\right)0, & i \in \{4, 6, 8, \ldots\},\\
1\,c_0\!\left(\tfrac{i-3}{2}\right)1, & i \in \{5, 7, 9, \ldots\}.
\end{cases}
$$

The term $b(j, k)$ denotes the $(j+1)$th codeword of a complete binary
code with $k$ items, which is order-preserving (alphabetic), with the
first $2^{\lceil \lg k \rceil} - k$ items having length $\lfloor \lg k \rfloor$ and the last
$2k - 2^{\lceil \lg k \rceil}$ items having length $\lceil \lg k \rceil$. In this case, that means that
$c_0(1) = 0b(0,3) = 00$, $c_0(2) = 0b(1,3) = 010$, and $c_0(3) = 0b(2,3) = 011$.
Thus, for example, $c_0(12) = 1c_0(5)0 = 11c_0(1)10 = 110010$.
This is a unary code of length $m$, followed by a binary digit $b$ (where
$b = 0$ or $b = 1$), and a binary code of length $m + b - 1$, and is thus
straightforward to encode, decode, and write in the form of an implicit
infinite search tree.
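The recursive definition and the unary/bit/binary description can be cross-checked in a few lines. The Python sketch below implements both forms of Code 0 for positive integers. The names `code0_recursive` and `code0` are ours.

The direct form assumes that group $m$ (unary part of length $m$) holds symbols $3 \cdot 2^{m-1} - 2$ through $3 \cdot 2^m - 3$. That is $2^{m-1}$ words with $b = 0$ followed by $2^m$ words with $b = 1$, which follows from the description above.

```python
def code0_recursive(i: int) -> str:
    """Code 0 following the recursive definition in the text."""
    if i < 1:
        raise ValueError("symbols are positive integers")
    if i < 4:
        return "0" + ("0", "10", "11")[i - 1]       # 0 b(i-1, 3)
    if i % 2 == 0:
        return "1" + code0_recursive(i // 2 - 1) + "0"
    return "1" + code0_recursive((i - 3) // 2) + "1"


def code0(i: int) -> str:
    """Equivalent direct form: unary 1^(m-1) 0, a bit b, then m + b - 1 bits."""
    if i < 1:
        raise ValueError("symbols are positive integers")
    m = ((i + 2) // 3).bit_length()     # group m holds i in [3*2^(m-1) - 2, 3*2^m - 3]
    half = 1 << (m - 1)                 # 2^(m-1) words with b = 0, then 2^m with b = 1
    offset = i - (3 * half - 2)
    if offset < half:
        b, width, value = 0, m - 1, offset
    else:
        b, width, value = 1, m, offset - half
    suffix = format(value, f"0{width}b") if width else ""
    return "1" * (m - 1) + "0" + str(b) + suffix


print([code0(i) for i in (1, 2, 3, 4, 12)])
# ['00', '010', '011', '1000', '110010']
```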

This code, like exponential-Golomb codes, is a modification of the
γ code. Whereas the γ code has an $m$-bit unary code followed by
a complete binary code for $2^{m-1}$ items, Code 0 follows the unary
prefix by a complete binary suffix for $3 \cdot 2^{m-1}$ items. Straightforward
extensions of this can be obtained by modifying the search tree. We
can add a $k$-bit binary number to each possible codeword (as in
the fourth and fifth sets of columns in Table I), extending Code 0
in the same manner that Rice codes extend unary codes, that is,

$$ c_k(i) = c_0\!\left(1 + \left\lfloor \frac{i-1}{2^k} \right\rfloor\right) b\!\left((i-1) \bmod 2^k,\ 2^k\right) $$

where $k > 0$ and $b((i-1) \bmod 2^k, 2^k)$ is the $k$-bit representation
of $(i-1) \bmod 2^k$. Call any of the new extensions Code $k$.
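A sketch of Code $k$ for $k > 0$ follows, assuming the direct form of Code 0 given earlier. The function names are illustrative. The last $k$ bits are simply the $k$-bit representation of $(i-1) \bmod 2^k$, as in a Rice code.

```python
def code0(i: int) -> str:
    """Code 0 codeword of i >= 1: unary 1^(m-1) 0, group bit b, then m + b - 1 bits."""
    m = ((i + 2) // 3).bit_length()
    half = 1 << (m - 1)
    offset = i - (3 * half - 2)
    b, width, value = (0, m - 1, offset) if offset < half else (1, m, offset - half)
    return "1" * (m - 1) + "0" + str(b) + (format(value, f"0{width}b") if width else "")


def code_k_positive(i: int, k: int) -> str:
    """Code k for k > 0: Code 0 of 1 + floor((i-1)/2^k), then k low-order bits."""
    if i < 1 or k < 1:
        raise ValueError("need i >= 1 and k >= 1")
    q, r = divmod(i - 1, 1 << k)
    return code0(1 + q) + format(r, f"0{k}b")


print([code_k_positive(i, 1) for i in range(1, 7)])
# ['000', '001', '0100', '0101', '0110', '0111']
```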

Another extension, similar to [15] and [34], involves first coding
with a finite code tree, then, if this initial codeword is all 1's, adding
Code 0. If we start as in a unary code and switch to Code 0 after $j$
ones, then let Code $-j$ denote the implied code, e.g., Code −1, the
second set of columns ($n_{-1}(\cdot)$ and $c_{-1}(\cdot)$) in Table I. Formally, for $k < 0$,

$$
c_k(i) = \begin{cases}
1^{i-1}0, & i \le -k,\\
1^{-k}\,c_0(i+k), & i > -k.
\end{cases}
$$
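Codewords of the kind shown in Table I can be regenerated from these definitions. The sketch below uses a hypothetical function `code_k` that reuses a direct form of Code 0 and covers negative, zero, and positive $k$. It prints the first eight codewords of Codes −2 through 2. It illustrates the definitions rather than reproducing the table's exact layout.

```python
def code0(i: int) -> str:
    """Code 0 codeword of i >= 1: unary 1^(m-1) 0, group bit b, then m + b - 1 bits."""
    m = ((i + 2) // 3).bit_length()
    half = 1 << (m - 1)
    offset = i - (3 * half - 2)
    b, width, value = (0, m - 1, offset) if offset < half else (1, m, offset - half)
    return "1" * (m - 1) + "0" + str(b) + (format(value, f"0{width}b") if width else "")


def code_k(i: int, k: int) -> str:
    """Code k for any integer k, following the definitions above."""
    if i < 1:
        raise ValueError("symbols are positive integers")
    if k > 0:                                   # append k low-order bits
        q, r = divmod(i - 1, 1 << k)
        return code0(1 + q) + format(r, f"0{k}b")
    if i <= -k:                                 # unary part of Code k, k < 0
        return "1" * (i - 1) + "0"
    return "1" * (-k) + code0(i + k)            # switch to Code 0 after -k ones


for i in range(1, 9):
    print(i, *(code_k(i, k) for k in (-2, -1, 0, 1, 2)))
```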



All codes presented here are 0-smooth (strongly smooth), and can 
be coded and decoded using only additions, subtractions, and shifts 
such that the total number of operations is proportional to the number 
of encoded output bits. 
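A decoder for Code 0 shows what the operation count looks like in practice. In the sketch below (illustrative names), the arithmetic on the decoded value uses only additions, subtractions, and shifts, apart from reading individual bits. The loop runs once per bit of the codeword. An encoder is included only to round-trip a test message.

```python
def code0(i: int) -> str:
    """Code 0 codeword of i >= 1 (used here only to build test input)."""
    m = ((i + 2) // 3).bit_length()
    half = 1 << (m - 1)
    offset = i - (3 * half - 2)
    b, width, value = (0, m - 1, offset) if offset < half else (1, m, offset - half)
    return "1" * (m - 1) + "0" + str(b) + (format(value, f"0{width}b") if width else "")


def decode_code0(bits: str, pos: int = 0):
    """Decode one Code 0 codeword starting at bits[pos]; return (i, next_pos)."""
    m = 1
    while bits[pos] == "1":            # unary part 1^(m-1) 0
        m += 1
        pos += 1
    pos += 1                           # skip the terminating 0
    b = bits[pos] == "1"
    pos += 1
    value = 0
    for _ in range(m - 1 + b):         # m + b - 1 suffix bits, most significant first
        value = (value << 1) + (bits[pos] == "1")
        pos += 1
    start = (3 << (m - 1)) - 2         # first symbol of group m: 3 * 2^(m-1) - 2
    if b:
        start += 1 << (m - 1)          # skip the 2^(m-1) words that have b = 0
    return start + value, pos


def decode_stream(bits: str):
    """Decode a concatenation of Code 0 codewords."""
    symbols, pos = [], 0
    while pos < len(bits):
        i, pos = decode_code0(bits, pos)
        symbols.append(i)
    return symbols


message = [1, 2, 3, 12, 100, 5]
print(decode_stream("".join(code0(i) for i in message)))
# [1, 2, 3, 12, 100, 5]
```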

IV. APPLICATION

Table II lists various distributions for which no optimal code is
known and estimates, in expected bits per input symbol, of coding
performance using several different codes. The entropy and the
expected bits per symbol of an optimal code are also estimated. $H$
denotes the entropy of the distribution ($H(P) = -\sum_i p(i) \lg p(i)$)
and $N^*$ the expected codeword length of the optimal code. Golin
denotes the best Golin code [35]; Code $k$ denotes the best of the
codes introduced here; Л denotes the Levenshtein (Левенштейн)
code [36]; γ/δ/ω/EG$k$ denotes the best of the Elias codes [14] and
exponential-Golomb codes [13], which in these examples is always
the Elias γ code (EG0); Y denotes Yokoo's code for the Gauss-Kuzmin
distribution [20]; and G$k$ denotes the best Golomb code (with
parameter $k$) [23]. These codes are defined in the cited papers and
the definitions are repeated in the Appendix, which also explains the
methods by which the estimations of bits per symbol are calculated.
In cases for which there are multiple codes and/or parameters, the best
one is chosen and indicated in parentheses. Note that, as in previous
papers on these and similar codes [13], [37], the best code is chosen
by its empirical performance; there appears to be no simple rule for
deciding which code to use.

We show the performance for the overall best fixed code for each
distribution in bold in Table II and, if a Golin code is better, this is in
italics. Note that Golin codes do well for inputs with rapidly declining
probabilities, whereas Yokoo's code and the codes introduced here
have the best results for inverse square probability mass functions.
However, Golin codes, in being calculated on the fly, are often
impractical, both due to the potential for rounding errors to lead to
coding errors and due to the computational complexity of the required
floating point divisions.

We find that Code −1 is of particular interest as it happens to
be an excellent code for the Gauss-Kuzmin distribution, defined and
well-approximated as follows:

$$ p_{\mathrm{GK}}(i) = -\lg\left(1 - \frac{1}{(i+1)^2}\right) \approx \frac{\lg e}{(i+1)^2}. $$

The approximation shows that it is a power law. The Gauss-Kuzmin distribution is
the one for which to code when expressing coefficients of continued
fractions, as in [19], [38], in which EG0 is proposed for use, and [20],
in which Yokoo's code is proposed. Code −1 is only about 0.008%
worse than the (approximated) optimal code, whereas Yokoo's code
is 0.449% worse and the Elias γ code (EG0) is 1.007% worse.
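The Gauss-Kuzmin figures can be sanity-checked with plain partial sums. The sketch below uses illustrative names and an arbitrarily chosen truncation point, N = 200,000. It computes the partial expected length of Code −1 and the partial entropy.

Because the sketch omits the tail bounds described in the Appendix, both outputs should come out slightly below the corresponding Table II entries.

```python
import math


def gauss_kuzmin(i: int) -> float:
    """p(i) = -lg(1 - 1/(i+1)^2); log1p keeps precision for large i."""
    return -math.log1p(-1.0 / (i + 1) ** 2) / math.log(2)


def code0_length(i: int) -> int:
    """n_0(i): 2m if the group bit b is 0, 2m + 1 if it is 1."""
    m = ((i + 2) // 3).bit_length()
    half = 1 << (m - 1)
    return 2 * m + (i - (3 * half - 2) >= half)


def code_neg_length(i: int, k: int) -> int:
    """Length of Code k for k < 0: unary for i <= -k, else -k + n_0(i + k)."""
    return i if i <= -k else -k + code0_length(i + k)


def truncated_estimates(N: int, k: int = -1):
    """Partial sums over i <= N of p(i) n_k(i) and of -p(i) lg p(i).

    Both are lower bounds on the infinite sums; the missing tails are not
    bounded here.
    """
    length = entropy = 0.0
    for i in range(1, N + 1):
        p = gauss_kuzmin(i)
        length += p * code_neg_length(i, k)
        entropy -= p * math.log2(p)
    return length, entropy


L, H = truncated_estimates(200_000)
print(f"{L:.5f} {H:.5f}")
```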

Note also that Code −2 is a good code for the zeta distribution
with parameter $s = 2$, where the zeta distribution is defined as

$$ p^{\zeta}_s(i) = \frac{1}{i^s\,\zeta(s)} $$

and $\zeta(s) = \sum_{i=1}^{\infty} i^{-s}$ is the Riemann zeta function, for $s > 1$.
The zeta distribution is used to model several phenomena including
language [5]. Optimal codes for the zeta distribution ($s = 2$) were
considered in Kato's unpublished manuscript [22]. In this work, the
optimal codeword lengths for the first ten symbols are shown to lie
in ranges of two possible values for each codeword (or one for the
first, which has $n(1) = 1$). The codeword lengths of Code −2 all lie
within the allowed ranges. However, we can empirically find better
codes, showing that Code −2, although the best simply described
code we know of, is about 0.005% worse than an optimal code.

TABLE I
Five of the codes introduced here
(The body of Table I is not recoverable from the extracted text.)

TABLE II
Compression (in bits per symbol) and code parameter (where applicable)

                      entropy  "designer" codes        fixed codes
Distribution          H        N*       Golin          Code k         Л        γ/δ/ω/EGk    Y        Gk
Gauss-Kuzmin          3.43253  3.47207  3.50705 (1,2)  3.472346 (−1)  3.77915  3.50705 (γ)  3.48765  ∞ (k)
Yule-Simon, ρ = 1     2.95215  2.98136  3 (1,2)        2.983338 (−1)  3.17826  3 (γ)        2.98138  ∞ (k)
Yule-Simon, ρ = 1.5   2.17073  2.21571  2.22507 (?)    2.230792 (−2)  2.32233  2.28020 (γ)  2.26031  2.85003 (?)
Yule-Simon, ρ = 2     1.74685  1.83787  1.84024 (?)    1.848484 (?)   1.91747  1.94200 (γ)  1.92361  2 (1)
Yule-Simon, ρ = 2.5   1.47629  1.62102  1.62191 (?)    1.626668 (−5)  1.68947  1.74664 (γ)  1.73044  1.66666... (1)
Yule-Simon, ρ = 3     1.28665  1.48534  1.48563 (?)    1.488172 (−6)  1.54608  1.61950 (γ)  1.60550  1.5 (1)
Zeta, s = 2           2.36259  2.41766  2.43310 (1)    2.417772 (−2)  2.53468  2.44631 (γ)  2.43042  ∞ (k)
Zeta, s = 2.5         1.46525  1.65431  1.65767 (1)    1.658015 (−4)  1.70907  1.73223 (γ)  1.71963  1.94737... (1)
Zeta, s = 3           0.97887  1.33453  1.33504 (?)    1.336680 (−4)  1.36956  1.42207 (γ)  1.41389  1.36843... (1)
A third distribution family is that of Yule [6] and Simon [2],

$$ p^{\mathrm{YS}}_\rho(i) = \rho\,B(i, \rho+1) = \frac{\rho \cdot \rho!\,(i-1)!}{(i+\rho)!}, $$

where $B(i, j)$ is the beta function, $\rho > 0$, and the right equation
applies for integer $\rho$. Thus, for example, if $\rho = 1$, then $p(i) = 1/(i(i+1))$.
Several statistics, from species population to word frequencies,
have been observed to obey a Yule-Simon distribution, most often
with parameter $\rho = 1$ [2]. This particular distribution is also related
to continued fractions, being the distribution of the first coefficient
when the number being represented is chosen uniformly over the
unit interval (0, 1). For $P^{\mathrm{YS}}_1$, Yokoo's code is 0.066% better than
Code −1.

The estimates in Table II were calculated based on finite sums
and estimates of the remaining infinite sum. For fixed codes and for
entropy, these values are calculated as in the Appendix, and are thus
accurate to the precision given. The Golin code was estimated based
on the partial code and conditional entropy of the remaining items.
Similarly, optimal expected codeword lengths were estimated using
an optimal code for the partial sum and the entropy of the remaining
items; although not having the same guaranteed accuracy, the results
seem to provide accurate estimates based upon the behavior of coding
truncated probability distributions of increasing size. In [39], it is
shown that sequences of such truncated distributions always have a
subsequence converging to the optimal code, providing theoretical
justification for the use of this technique. Values that are exactly
calculated from infinite sums, rather than estimated, are indicated
by the reduced number of figures (for multiples of 0.1) or through
ellipses in the case of

$$ 1.66666\ldots = \frac{5}{3}, \qquad 1.94737\ldots = \frac{\zeta(1.5)}{\zeta(2.5)}, \qquad \text{and} \qquad 1.36843\ldots = \frac{\zeta(2)}{\zeta(3)}. $$

These values are exactly known due to being means of Yule-Simon
and zeta distributions, which are known in closed form. In addition,
the average length of the Elias γ code (EG0) for a Yule-Simon
distribution with $\rho = 1$ is easily calculated as

$$ \sum_{i=1}^{\infty} p(i)n(i) = 1 + 2\sum_{j=0}^{\infty} j \sum_{i=2^j}^{2^{j+1}-1} \frac{1}{i(i+1)} = 1 + 2\sum_{j=0}^{\infty} j\,2^{-j-1} = 3. $$
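The closed-form values quoted above can be reproduced numerically. The sketch below rests on two assumptions of its own. It computes $\zeta(s)$ with a standard Euler-Maclaurin tail correction, and it evaluates the Yule-Simon pmf through `math.lgamma`. The helper names are illustrative.

The sketch checks three things:

- the zeta means $\zeta(s-1)/\zeta(s)$;
- the Yule-Simon means $\rho/(\rho-1)$, which the unary code G1 attains because $n(i) = i$;
- the γ-code calculation for $\rho = 1$, using the fact that the group $2^j \le i < 2^{j+1}$ has probability $2^{-j-1}$ and codeword length $2j+1$.

```python
import math
from fractions import Fraction


def zeta(s: float, N: int = 100) -> float:
    """Riemann zeta for s > 1: partial sum to N-1 plus Euler-Maclaurin tail terms."""
    head = sum(n ** -s for n in range(1, N))
    tail = (N ** (1 - s) / (s - 1) + 0.5 * N ** -s
            + s * N ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * N ** (-s - 3) / 720)
    return head + tail


def yule_simon(i: int, rho: float) -> float:
    """p(i) = rho * B(i, rho + 1), evaluated through log-gamma."""
    return rho * math.exp(math.lgamma(i) + math.lgamma(rho + 1) - math.lgamma(i + rho + 1))


def yule_simon_mean(rho: float, N: int = 100_000) -> float:
    """Partial sum of i p(i); the mean is rho / (rho - 1) for rho > 1."""
    return sum(i * yule_simon(i, rho) for i in range(1, N + 1))


def gamma_on_ys1(J: int) -> Fraction:
    """Exact sum over i < 2^J of n_gamma(i) / (i(i+1)), grouped by j = floor(lg i)."""
    return sum((Fraction(2 * j + 1, 2 ** (j + 1)) for j in range(J)), Fraction(0))


print(zeta(1.5) / zeta(2.5), zeta(2) / zeta(3))     # zeta means for s = 2.5, 3
print(yule_simon_mean(2.5), yule_simon_mean(3.0))   # approach 5/3 and 3/2
print(float(gamma_on_ys1(60)))                      # approaches 3
```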

Golin's algorithms both result in the same code for this distribution, 
since the algorithms' conditions result in groupings of probabilities 
summing to powers of two. 

Excluding Golin codes, we find that the codes introduced here do
quite well, only failing to improve upon existing fixed codes in one
case, the Yule-Simon distribution with parameter $\rho = 1$ ($p(i) =
1/(i(i+1))$). Because Yokoo's code requires computing codewords
for complete binary codes with unequal codeword lengths, however,
the codewords of codes introduced here require less computation to
encode and decode. For all tested distributions, Yokoo's code and the
codes introduced here are both strict improvements on exponential-Golomb
and Elias codes, confirming that, in practice, strongly smooth
codes are preferable to those lacking this property.

Note that not all known codes for integers were tested here; certain 
codes can be ruled out due to the length of the first few codewords 
(e.g., Even-Rodeh [40], Williams-Zobel [41]), whereas others lack the 
alphabetic property and/or have significantly higher computational 
complexity (e.g., Fibonacci [42], [43]). In comparison to feasible 
codes, the codes introduced here are a notable improvement. While 
not optimal, they can be quite useful in practical applications. 
APPENDIX

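Consider all codes and probability distributions that are monotonic
and for which we can find $\alpha$, $\beta$, $\kappa \ge 0$, $\mu$, $\theta > 0$, $\tau > 0$, $\upsilon > 0$, $\phi > 0$
such that $p(i)$ is bounded by the power-law envelope $\phi(i+\kappa)^{-(\theta+1)}$
(from below for the lower bound and from above for the upper bound that
follow; the parameters may be chosen separately for the two bounds) and

$$ n(i) \in \left[\tau \ln(i + \mu + 1) + \alpha,\ \upsilon \ln(i + \mu) + \beta\right] $$

for large enough $i \ge i_{\min}$. Then, for $x \ge i_{\min}$, we have

$$
\begin{aligned}
\sum_{i=x}^{\infty} p(i)n(i) &\ge \int_x^{\infty} p(i)\,n(i-1)\,di
\ge \int_x^{\infty} \frac{\phi\,(\tau \ln(i+\mu) + \alpha)}{(i+\kappa)^{\theta+1}}\,di\\
&\ge \int_x^{\infty} \frac{\phi\,(\tau \ln(i+\kappa) + \tau f_{\min}(x) + \alpha)}{(i+\kappa)^{\theta+1}}\,di
= \frac{\tau\phi \ln(x + \min(\kappa, \mu)) + \tau\phi\theta^{-1} + \alpha\phi}{\theta\,(x+\kappa)^{\theta}},
\end{aligned}
$$

where $f_{\min}(x) = \min(\ln(x+\mu) - \ln(x+\kappa), 0)$, and

$$
\begin{aligned}
\sum_{i=x}^{\infty} p(i)n(i) &\le \int_x^{\infty} p(i-1)\,n(i)\,di
\le \int_x^{\infty} \frac{\phi\,(\upsilon \ln(i+\kappa-1) + \upsilon f_{\max}(x) + \beta)}{(i+\kappa-1)^{\theta+1}}\,di\\
&= \frac{\upsilon\phi \ln(x + \max(\kappa - 1, \mu)) + \upsilon\phi\theta^{-1} + \beta\phi}{\theta\,(x+\kappa-1)^{\theta}},
\end{aligned}
$$

where $f_{\max}(x) = \max(\ln(x+\mu) - \ln(x+\kappa-1), 0)$, providing upper and
lower bounds to average codeword length using code $N = \{n(i)\}$
for probability distribution $P = \{p(i)\}$. Other codes (such as
Golomb codes) and entropy can be bounded similarly.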

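Such an approach enables us to find estimates with accuracies
limited only by the precision of the partial summations (i.e., round-off
error). For the probability distributions currently under consideration,
we have:

  $P_{\mathrm{GK}}$:           $\theta = 1$,      $\phi = \lg e$;
  $P^{\mathrm{YS}}_\rho$:      $\theta = \rho$,   $\phi = \rho\,\Gamma(\rho+1)$;
  $P^{\zeta}_s$:               $\theta = s - 1$,  $\phi = 1/\zeta(s)$.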

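In order to find bounds for expected codeword lengths, we should
first define the codes we are using. Since we only care about
codeword lengths, we use code definitions that apply to $\mathcal{X}$ and have
the same lengths $N$ as the (equivalent but possibly different) original
definitions:

Elias γ:
$$ c_\gamma(i) = \begin{cases} 0, & i = 1,\\ 1\,c_\gamma(i/2)\,0, & i \in \{2, 4, 6, \ldots\},\\ 1\,c_\gamma((i-1)/2)\,1, & i \in \{3, 5, 7, \ldots\} \end{cases} $$

Elias δ:
$$ c_\delta(i) = c_\gamma(1 + \lfloor \lg i \rfloor)\; b\!\left(i - 2^{\lfloor \lg i \rfloor},\ 2^{\lfloor \lg i \rfloor}\right) $$

Elias ω:
$$ c_\omega(i) = \begin{cases} 0, & i = 1,\\ \ldots, & i > 1 \end{cases} $$

Л:
$$ c_{\text{Л}}(i) = \begin{cases} 0, & i = 1,\\ 1\,c_\omega(i-1), & i > 1 \end{cases} $$

EG$k$:
$$ c_{\mathrm{EG}k}(i) = c_\gamma\!\left(1 + \left\lfloor \frac{i-1}{2^k} \right\rfloor\right) b\!\left((i-1) \bmod 2^k,\ 2^k\right) $$

Yokoo:
$$ c_{\mathrm{Yok}}(i) = \begin{cases} 0, & i = 1,\\ 100, & i = 2,\\ 101, & i = 3,\\ 1^{g_i}00\; b(i - 2^{g_i}, m_i), & 4 \le i < q_i,\\ 1^{g_i}01\; b(i - q_i, 2^{g_i} - m_i), & i \ge q_i, \end{cases} $$

where $c'_\gamma(i)$ is all but the last bit of $c_\gamma$, $g_i = \lfloor \lg i \rfloor$,
$m_i = (2^{g_i} - (-1)^{g_i})/3$, and $q_i = 2^{g_i} + m_i$. Recall that $b(j, k)$ denotes
the $(j+1)$th codeword of a complete binary code with $k$ items.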
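The length-equivalent definitions above translate directly into Python. In the sketch below, the names are illustrative and the Yokoo branch follows the case analysis above, with $g_i$, $m_i$, and $q_i$ as defined there. The ω and Л codes are omitted.

```python
def complete_binary(j: int, k: int) -> str:
    """b(j, k): (j+1)th word of the alphabetic complete binary code with k items."""
    width = (k - 1).bit_length()                # ceil(lg k)
    short = (1 << width) - k                    # words one bit shorter come first
    if j < short:
        width, value = width - 1, j
    else:
        value = j + short
    return format(value, f"0{width}b") if width else ""


def elias_gamma(i: int) -> str:
    """Alphabetic gamma: c(1) = 0; c(i) = 1 c(floor(i/2)) (i mod 2)."""
    return "0" if i == 1 else "1" + elias_gamma(i // 2) + str(i % 2)


def elias_delta(i: int) -> str:
    g = i.bit_length() - 1                      # floor(lg i)
    return elias_gamma(1 + g) + complete_binary(i - (1 << g), 1 << g)


def exp_golomb(i: int, k: int) -> str:
    q, r = divmod(i - 1, 1 << k)
    return elias_gamma(1 + q) + complete_binary(r, 1 << k)


def yokoo(i: int) -> str:
    if i == 1:
        return "0"
    g = i.bit_length() - 1                      # g_i = floor(lg i)
    m = ((1 << g) - (-1) ** g) // 3             # m_i
    q = (1 << g) + m                            # q_i
    if i < q:
        return "1" * g + "00" + complete_binary(i - (1 << g), m)
    return "1" * g + "01" + complete_binary(i - q, (1 << g) - m)


print([yokoo(i) for i in range(1, 8)])
# ['0', '100', '101', '1100', '11010', '110110', '110111']
```

For $i \ge 2$ the general Yokoo branches already produce 100 and 101 for $i = 2, 3$, so only $i = 1$ needs a special case in this sketch.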
For these codes, $\alpha$, $\beta$, $\mu$, $\tau > 0$, and $\upsilon > 0$ can be chosen as follows:

  Code                       α          β         μ        τ         υ
  γ, Yokoo                   −1         1         0        2 lg e    2 lg e
  Л (i > 1)                  2          2         −1       lg e      2.5 lg e
  Code k (k ≤ 0, i > −k)     a0 − k     −1 − k    2 + k    2 lg e    2 lg e

where $a_0 = 1 - 2\lg 3$. (Parameters for δ codes, ω codes, EG$k$
codes, and Code $k$ for $k > 0$ can be similarly formulated, but these
are unused here, as the γ code is clearly better for all distributions
considered.)
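The γ and Code $k$ rows of this table can be spot-checked over a finite range. The sketch below uses illustrative names and checks only $i < 10{,}000$, so it cannot prove the bounds for all $i$.

For each code it tests whether $n(i)$ lies in the interval $[\tau \ln(i+\mu+1) + \alpha,\ \upsilon \ln(i+\mu) + \beta]$ stated at the start of this Appendix. A small tolerance absorbs floating-point error at the cases where a bound is met with equality.

```python
import math

TAU = UPSILON = 2 / math.log(2)            # 2 lg e, so tau * ln(x) = 2 lg x
A0 = 1 - 2 * math.log2(3)                  # a_0 = 1 - 2 lg 3


def gamma_length(i: int) -> int:
    return 2 * (i.bit_length() - 1) + 1


def code_k_length(i: int, k: int) -> int:
    """Length of Code k for k <= 0 (unary part for i <= -k, then Code 0)."""
    if i <= -k:
        return i
    j = i + k
    m = ((j + 2) // 3).bit_length()
    half = 1 << (m - 1)
    return -k + 2 * m + (j - (3 * half - 2) >= half)


def within(n, i, alpha, beta, mu, tau=TAU, upsilon=UPSILON, tol=1e-9):
    """tau ln(i + mu + 1) + alpha <= n <= upsilon ln(i + mu) + beta, up to tol."""
    lower = tau * math.log(i + mu + 1) + alpha
    upper = upsilon * math.log(i + mu) + beta
    return lower - tol <= n <= upper + tol


ok_gamma = all(within(gamma_length(i), i, -1, 1, 0) for i in range(1, 10_000))
ok_codes = all(within(code_k_length(i, k), i, A0 - k, -1 - k, 2 + k)
               for k in range(0, -4, -1) for i in range(-k + 1, 10_000))
print(ok_gamma, ok_codes)
# True True
```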

For finding the best code within code families with multiple codes,
such as Code $k$, EG$k$, and G$k$ (Golomb code $k$, defined in the
main text), partial sums can be used to limit the number of codes
tested to a finite number. For example, these codes have $n(1) \to \infty$ as
$k \to +\infty$, so at some point $p(1)n(1)$ will be too large to consider
Code $k$ with parameters $k > k_{\max}$ for some $k_{\max}$. Similarly, as
$k \to -\infty$, the unary portion of the code can be used for the partial
sum.

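Lacking $\alpha$, $\beta$, $\mu$, $\tau$, $\upsilon$, an obvious lower bound for $\sum_{i=x}^{\infty} p(i)n(i)$
is $n(x)\sum_{i=x}^{\infty} p(i)$, but a much more accurate bound can be found
via entropy bounding with a value of $x$ such that $\sum_{i=1}^{x-1} 2^{-n(i)} =
1 - 2^{-y_x}$ for some $y_x$. For such values, since the code can be assumed
without loss of generality to be monotonic, the codewords can be
assumed to be all the leaves of a subtree rooted at depth $y_x$. Since
any normalized tree is subject to the entropy bound $\sum_i p(i)n(i) \ge
H(P)$, we can normalize to find a useful bound for the overall code.
Let us first assign

$$ \sigma_x = \sum_{i=1}^{x-1} p(i), \qquad H_x = \sum_{i=x}^{\infty} p(i) \lg \frac{1}{p(i)}, $$

so that the normalized tail distribution has entropy

$$ \sum_{i=x}^{\infty} \frac{p(i)}{1 - \sigma_x} \lg \frac{1 - \sigma_x}{p(i)} = \frac{H_x}{1 - \sigma_x} + \lg(1 - \sigma_x), $$

where $H_x$ can be lower-bounded as previously described. Thus,
applying the entropy bound to the normalized subtree,

$$ \sum_{i=x}^{\infty} p(i)n(i) = (1 - \sigma_x) \sum_{i=x}^{\infty} \frac{p(i)}{1 - \sigma_x}\, n(i) \ge H_x + \left(y_x + \lg(1 - \sigma_x)\right)(1 - \sigma_x). $$

This is useful for the codes calculated on the fly, e.g., Golin's codes.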
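The bound is straightforward to evaluate once $\sigma_x$ and a value (or lower bound) for $H_x$ are available. In the minimal sketch below (illustrative names), `subtree_depth` finds $y_x$ exactly with rational arithmetic, and `tail_lower_bound` evaluates the final expression.

The example uses the unary code with $p(i) = 2^{-i}$. That distribution is dyadic, so the bound is attained.

```python
import math
from fractions import Fraction


def subtree_depth(lengths, x):
    """y_x if sum_{i<x} 2^-n(i) == 1 - 2^-y_x for an integer y_x, else None.

    lengths[0] is n(1); x is 1-based, as in the text.
    """
    rest = 1 - sum((Fraction(1, 2 ** n) for n in lengths[: x - 1]), Fraction(0))
    d = rest.denominator
    if rest <= 0 or rest.numerator != 1 or d & (d - 1):
        return None
    return d.bit_length() - 1


def tail_lower_bound(sigma_x, H_x, y_x):
    """Entropy bound on sum_{i>=x} p(i) n(i): H_x + (y_x + lg(1 - sigma_x))(1 - sigma_x)."""
    return H_x + (y_x + math.log2(1 - sigma_x)) * (1 - sigma_x)


# Unary code n(i) = i with p(i) = 2^-i; here lg(1/p(i)) = n(i), so H_x equals the tail length.
x = 5
lengths = list(range(1, 40))
y = subtree_depth(lengths, x)
sigma = sum(2.0 ** -i for i in range(1, x))
H_tail = sum(i * 2.0 ** -i for i in range(x, 200))
actual = sum(i * 2.0 ** -i for i in range(x, 200))
print(y, tail_lower_bound(sigma, H_tail, y), actual)
```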

Golin's original approach, alg1, starts by finding the minimum
value $k_1$ such that

$$ \sum_{i=1}^{2^{k_1}} p(i) \ge \frac{3 - \sqrt{5}}{2} = 0.381966\ldots $$

and assigning the first $2^{k_1}$ inputs code $0\,b(i-1, 2^{k_1})$. The algorithm
then normalizes the remaining inputs and finds the minimum value
$k_2$ such that

$$ \frac{\sum_{i=2^{k_1}+1}^{2^{k_1}+2^{k_2}} p(i)}{\sum_{i=2^{k_1}+1}^{\infty} p(i)} \ge \frac{3 - \sqrt{5}}{2} $$

and assigns the next $2^{k_2}$ inputs code $10\,b(i - 1 - 2^{k_1}, 2^{k_2})$. Continuing
as needed, the algorithm sequentially finds minimum $k_h$ (given $k_1$
through $k_{h-1}$) such that

$$ S(k_h, P) = \frac{\sum_{i=K(h-1)+1}^{K(h-1)+2^{k_h}} p(i)}{\sum_{i=K(h-1)+1}^{\infty} p(i)} \ge \frac{3 - \sqrt{5}}{2}, $$

where $K(h) = \sum_{j=1}^{h} 2^{k_j}$, and assigns code

$$ 1^{h-1}0\; b\!\left(i - 1 - K(h-1),\ 2^{k_h}\right) $$

to items $1 + K(h-1) = 1 + \sum_{j=1}^{h-1} 2^{k_j}$ through $K(h) = \sum_{j=1}^{h} 2^{k_j}$.

This top-down approach is quite similar to Shannon-Fano coding
[44], a modification of which results in alg2, previously proposed in
[45]. In this case, the grouping condition is not the first $k_h$ such
that $S(k_h, P) \ge (3 - \sqrt{5})/2$, but the $k_h$ minimizing

$$ \left| S(k_h, P) - 0.5 \right|, $$

that is, the group of a power of two that results in the most even
division between those grouped and those left ungrouped. (Note that
Shannon-Fano codes use the overall "best split" whereas these codes
use the best split that groups items together in powers of two.)
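Both grouping rules can be sketched on a finite prefix of a distribution. This is an assumption of the sketch, not part of the algorithms as described, which operate on the full infinite distribution.

Here the remaining mass in $S(k, P)$ is taken from the truncated list only, and the last group is simply cut short. Group $h$ receives codewords $1^{h-1}0\,b(\cdot, 2^{k_h})$ of length $h + k_h$. Names are illustrative.

```python
import math

THRESHOLD = (3 - math.sqrt(5)) / 2        # 0.381966...


def golin_exponents(p, rule="alg1"):
    """Group exponents k_1, k_2, ... for a finite pmf prefix p[0] = p(1), ...

    alg1: smallest k with S(k, P) >= (3 - sqrt 5)/2.
    alg2: k minimizing |S(k, P) - 0.5|.
    S(k, P) is the mass of the next 2^k items over the mass of all remaining
    items in the list; a group running past the end of p is cut short.
    """
    ks, start, n = [], 0, len(p)
    while start < n:
        remaining = sum(p[start:])
        best_k, best_gap, k = 0, math.inf, 0
        while True:
            size = min(1 << k, n - start)
            S = sum(p[start:start + size]) / remaining
            if rule == "alg1":
                if S >= THRESHOLD or size == n - start:
                    best_k = k
                    break
            else:
                if abs(S - 0.5) < best_gap:
                    best_k, best_gap = k, abs(S - 0.5)
                if size == n - start:
                    break
            k += 1
        ks.append(best_k)
        start += 1 << best_k
    return ks


def golin_lengths(p, rule="alg1"):
    """Codeword 1^(h-1) 0 b(., 2^k_h) has length h + k_h for every item of group h."""
    lengths = []
    for h, k in enumerate(golin_exponents(p, rule), start=1):
        lengths.extend([h + k] * (1 << k))
    return lengths[: len(p)]


gk = [-math.log1p(-1.0 / (i + 1) ** 2) / math.log(2) for i in range(1, 1001)]
print(golin_exponents(gk, "alg1")[:8], golin_exponents(gk, "alg2")[:8])
```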